add inplace logic into new_executor #35618

Merged
merged 6 commits into from
Sep 15, 2021
@wanghuancoder (Contributor) commented Sep 9, 2021

PR types

New features

PR changes

Others

Describe

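Add an inplace strategy. With inplace enabled, mutable_data is executed fewer times, so GPU memory usage drops somewhat and speed improves somewhat.

  • Tested with standalone_executor_test on the PTB model with BatchSize=20. Run time before and after adding the GC mechanism (with FLAGS_eager_delete_tensor_gb=0.1): 23.4s → 20.4s
  • In the same test, GPU memory usage before and after inplace: 1150M → 1086M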
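The effect described above (fewer allocations through mutable_data when the output reuses the input's buffer) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyTensor`, `MakeInput`, and `RunScaleOp` are hypothetical names invented for illustration and are not Paddle's classes, and the allocation rule (allocate only when there is no buffer or it is too small) is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a tensor; NOT Paddle's Tensor class.
struct ToyTensor {
  std::shared_ptr<std::vector<float>> holder;  // underlying buffer

  // Returns a writable buffer of n elements. Allocates only when the tensor
  // has no buffer yet or the existing buffer is too small (assumed rule).
  float* mutable_data(std::size_t n, int* alloc_count) {
    if (!holder || holder->size() < n) {
      holder = std::make_shared<std::vector<float>>(n);
      ++*alloc_count;
    }
    return holder->data();
  }

  // Makes this tensor point at the same buffer as `other`.
  void ShareBufferWith(const ToyTensor& other) { holder = other.holder; }
};

// Builds an input tensor with n elements, all set to `value`.
ToyTensor MakeInput(std::size_t n, float value) {
  ToyTensor t;
  t.holder = std::make_shared<std::vector<float>>(n, value);
  return t;
}

// Elementwise "multiply by 2" op. Returns how many allocations the output
// needed. With use_inplace, the output shares the input buffer before the
// kernel runs, so mutable_data finds a large-enough buffer and skips allocation.
int RunScaleOp(const ToyTensor& in, ToyTensor& out, std::size_t n,
               bool use_inplace) {
  int allocs = 0;
  if (use_inplace) out.ShareBufferWith(in);
  const float* src = in.holder->data();
  float* dst = out.mutable_data(n, &allocs);
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] * 2.0f;
  return allocs;
}
```

In this sketch, calling `RunScaleOp` with `use_inplace = false` allocates one new output buffer, while `use_inplace = true` allocates none and writes the result into the input's buffer. That is only safe for ops where each output element depends on the matching input element, which is why a real implementation has to check which ops and shapes qualify.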

@paddle-bot-old (bot) commented Sep 9, 2021

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

}
}

} // namespace
@Aurelius84 (Contributor) commented Sep 15, 2021

GetTensorFromVar and GetMutableTensorFromVar are both already defined in paddle/fluid/framework/details/share_tensor_buffer_functor.cc. Could these two functions be extracted into framework-level utils so they can be reused?

Contributor Author:
Done, thanks!

@@ -243,6 +304,16 @@ void InterpreterCore::RunInstruction(const Instruction& instr_node) {
instr_node.kernel_func_.operator_base_)
->InferShape(instr_node.infershape_ctx_.get());

if (FLAGS_new_executor_use_inplace) {
Collaborator:
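Once the underlying buffer is shared, the Variable's ref count also needs to be incremented. Because out and input share the same data, if out is used by another op, the input's data must not be released early.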

Contributor Author:
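> Once the underlying buffer is shared, the Variable's ref count also needs to be incremented. Because out and input share the same data, if out is used by another op, the input's data must not be released early.

That won't happen. The share is performed before the Instruction runs, and at that point In and Out each hold the holder through a shared_ptr. After the Instruction runs and In is handed to the GC, the GC only decrements the shared_ptr's RefCount by one, so Out can continue to hold the holder normally.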
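The reference-counting argument in this reply can be checked with plain `std::shared_ptr`. The following is a minimal sketch, assuming the holder is an ordinary `std::shared_ptr` and that the GC simply drops the input's reference; `ToyVar`, `ToyGarbageCollect`, and `UseCountAfterGC` are hypothetical names for illustration, not Paddle's Variable or GC classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a variable owning a buffer via shared_ptr.
struct ToyVar {
  std::shared_ptr<std::vector<float>> holder;
};

// Hypothetical GC step: drops only this variable's reference to the buffer.
void ToyGarbageCollect(ToyVar& var) { var.holder.reset(); }

// Shares In's buffer with Out, collects In, and reports how many owners the
// buffer still has. The buffer stays alive because Out owns a reference.
long UseCountAfterGC() {
  ToyVar in;
  in.holder = std::make_shared<std::vector<float>>(4, 1.0f);

  ToyVar out;
  out.holder = in.holder;  // share before the instruction runs
  assert(in.holder.use_count() == 2);

  ToyGarbageCollect(in);  // In is handed to the GC after the instruction

  assert(out.holder != nullptr);   // Out still holds the buffer
  assert((*out.holder)[0] == 1.0f);  // and the data is still readable
  return out.holder.use_count();
}
```

Here `UseCountAfterGC()` ends with a single remaining owner: collecting In lowers the count from two to one rather than freeing the buffer, which matches the behaviour described in the reply.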

@wanghuancoder wanghuancoder merged commit bd79ae0 into PaddlePaddle:develop Sep 15, 2021
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021
* add inplace logic into new_executor, test=develop

* check shape and add inplace FLAGS, test=develop

* refine, test=develop

* refine, test=develop
4 participants